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Abstract. Models of biological coevolution have recently been proposed and studied, 
in which a species is defined by a genome in the form of a finite bitstring, and the 
interactions between species i and j are given by a fixed matrix with independent, 
randomly distributed elements Mij . A consequence of the stochastic independence is 
that species whose genotypes differ even by a single bit may have completely different 
phcnotypes, as defined by their interactions with the other species. This is clearly 
unrealistic, as closely related species should be similar in their interactions with the 
rest of the ecosystem. Here we therefore study a model, in which the My are correlated 
to a controllable degree by means of a local averaging scheme. We calculate, both 
analytically and numerically, the correlation function for matrix elements My and 
Mki versus the Hamming distance between the bitstrings representing the species. 
The agreement between the analytical and numerical calculations is excellent for 
correlations of limited range, but explainable differences arise for correlation ranges 
that are a significant fraction of the length of the bitstring. We compare long kinetic 
Monte Carlo simulations of coevolution models with uncorrelated and correlated 
interactions, respectively. In particular, we consider the probability density for the 
lifetimes of individual species. The species-lifetime distribution is close to a power 
law with an exponent near —2 over eight decades in time in both uncorrelated and 
correlated cases. The durations of quasi-steady states and power spectral densities for 
the diversity indices display noticeable differences. However, some qualitative features, 
like 1// behaviour in power spectral densities for the diversity indices, are not affected 
by the correlations in the interaction matrix. 
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1. Introduction 

During the last decade, ecological dynamics and biological evolution have become new 
areas of interest for statistical physicists pQ. This trend has merged with recent 
developments in the theory of complex networks [2j, which has drawn increased 
attention to problems in ecological dynamics, such as the evolution and stability of 
food webs [H 111 E] • 

While many models of macroevolution are formulated explicitly at evolutionary 
(macroscopic) time scales, evolutionary dynamics in nature are driven by reproduction, 
mutations, and selection at the ecological (microscopic) scale of individual organisms. 
Although this means that individual-based models of macroevolution must span a 
dauntingly large range of timescales, this is now becoming possible with the aid of 
modern computers and algorithms. We therefore recently studied an individual-based 
biological coevolution model [Tj |HJ E] with random interspecies interactions. This 
model is a simplified version of the tangled-nature model introduced by Jensen, et 
al. [TUl HH E2] • Individuals are represented by a haploid genome consisting of a string of 
bits. The individuals reproduce asexually and interact with individuals of other species 
through a random interaction matrix with stochastically independent elements, which 
determine how the reproduction probability of individuals of each species is affected by 
the presence and abundance of other species in the "ecosystem" . During reproduction, 
an offspring individual can randomly mutate with a small but constant probability 
by having a single bit in its genome flipped. These mutations create perturbations 
in the ecosystem by introducing new species. A particular mutant may or may not 
succeed, depending on its interactions with the species already present in the system. 
Usually a mutant does not fit in well with the resident species, and so it quickly 
goes extinct. However, occasionally a successful mutant can cause avalanches of mass 
extinctions, destroying some or all of the other species through predation or competition. 
As a consequence, the individuals in this model ecosystem live in a dynamic "fitness 
landscape" , which evolves due to changes in the populations of the resident species and 
the introduction of new species through mutations. The model displays quiet periods 
interrupted by bursts of high activity caused by mass extinctions that are triggered by 
the introduction of new mutants, reminiscent of the punctuated-equilibrium mode of 
evolution suggested by Eldredge and Gould [T3J EH US] ■ The lifetime distribution for 
individual species is well described by a power law with exponent close to —2. 

A rather unrealistic aspect of this model is the fact that the elements of the 
interaction matrix are completely uncorrelated. This means that a mutant in general will 
have completely different interactions with other species, and thus a completely different 
phenotype, than does its parent "wildtype" . In the present paper, we therefore modify 
our previous model by introducing correlations into the interaction matrix. (Similar 
ideas have been pursued by Kauffman [To^ ll7|.) In the modified model, the correlations 
between matrix elements ensure that mutants are not completely different from their 
parents. Rather, their interactions with the other species are similar to those of the 
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wildtype. The price to be paid is a significant reduction in the number of effectively 
independent phenotypes for a genome of a given length. In this paper we compare 
numerical results for this modified model with results for the corresponding uncorrelated 
model in order to assess the effects of including correlations. The results show that, 
although long-range correlations affect the dynamics of the system noticeably, some 
qualitative features remain unchanged. 

The organization of the rest of this paper is as follows. In section El we describe 
the model and the algorithm used for the simulations. In section El we describe the 
construction of the correlated interaction matrix and briefly discuss some complications 
of this process. In section 0] we report our results, which are summarized in sectional 
Derivations of our analytical results are provided in the Appendices. 



2. Model 



A species in our Monte Carlo (MC) simulations is represented by a bitstring genome of 
length L. This L-bit genome supplies a pool of 2 L possible species. These organisms 
are considered haploids. Thus, the terms "species", "species index", and "genotype" 
are synonymous in this paper. 

Individuals in the simulation are allowed to give birth to F offspring per generation 
with a probability Pj, where i denotes the species index. The reproduction is asexual 
(cloning), and it occurs in discrete, nonoverlapping generations. Thus, only offspring 
can survive to the next generation, and all individuals die at the end of the generation 
whether they reproduce or not. We use a mathematically convenient, nonlinear form 
for the probability of reproduction |Hl EH HH UH| , given by 

PiiMt)}) = f " T > C 1 ) 



1 + exp 



V . M ljnj (t)/N tot (t) + N tot (t)/N 



where n^t) is the population of species i at time t. The Verhulst factor N represents the 
carrying capacity of the "ecosystem" and keeps the total population, N tot (t) = £V rij(t), 
finite. M is the interaction matrix, in which a matrix element My represents the effect 
of the population density of species j on species i. Species i benefits from species j if 
My is positive, and it is inhibited by species j if My is negative. There is no interaction 
between species i and j when My is zero. The structure of M is discussed in the next 
section. 

Although we begin the simulations with a population consisting of only one species, 
the system diversifies quickly because, in each generation, all individuals are exposed 
to mutations that create new species. The mutation process is modelled by flipping a 
randomly chosen bit in the individual's genome with a probability /i per generation per 
individual. 

We note that the population dynamics used in the model is somewhat unrealistic, 
due to the absence of external energy resources and conservation of energy (or biomass) 

HI El El CHI HUE!. 
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Figure 1. Some realizations of matrix-element distributions for typical runs with 
L = 8 and n = 0, 1, 3, 5, and 7. (The distributions are not averaged over independent 
runs.) For these particular realizations, the distributions for n = 0, 1, 3, and 5 
practically overlap. The distribution for n — 7 is narrower due to effects that are 
explained in section T3. 21 



3. The Interaction Matrix 

3.1. Form of the Matrix 

The structure of the interaction matrix characterizes the dynamics of the system. 
Rikvold and Zia [7J |Hj studied the same system (except for a negligible probability 
of 0(n 2 ) for multiple mutations) using an interaction matrix with off-diagonal elements 
distributed randomly and uniformly over [—1,1] and with the diagonal elements set to 
zero. We shall call this the uniform-uncorrelated model. For reasons that will become 
clear later, we instead use a Gaussian distribution to facilitate some comparisons. We 
call this model the Gaussian-uncorrelated (or just the uncorrelated) model. 

The main focus of this paper is the effects of correlations in the interaction matrix. 
(Some preliminary results of this work were discussed in reference |18j .) The motivation 
behind the introduction of correlations is the following. In reality, the interaction 
between two species x and z should be positively correlated with the interaction between 
species y and z if x and y are closely related. Therefore, one would expect a positive 
correlation between the matrix elements M xz and M yz . 

A correlated matrix is generated using the following method. First, a random 
matrix with elements drawn from a Gaussian distribution with standard deviation 
cr = a/1/3 and mean zero is generated. (We chose this standard deviation for 
backward compatibility with earlier studies in which the matrix elements were uniformly 
distributed on [—1, 1].) Then, each matrix element in the random matrix is averaged 
over itself and its neighbours in genome space up to a Hamming distance n, where 
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n is the averaging radius. Thus, the total number of terms in the average (or the 
number of points inside the averaging "hypersphere" with radius n in genome space) 
is Z n = Ylm=o ( m) • ^ ne avera S e is multiplied by \fZ^ in order to keep the standard 
deviation of the correlated matrix elements approximately the same as for the random 
ones. This process eliminates any possible effect by a change in the shape of the 
distribution of interaction strengths. Thus, a correlated matrix element My is given 
by 

M l3 = M °ki/Vz~n, (2) 

H(ij;kl)<n 

where are the elements of the uncorrelated matrix, which are Gaussian distributed 
with standard deviation <7o, and H(ij; kl) is the Hamming distance between two matrix 
elements M?- and M^. We use the city-block metric to define the Hamming distance 
between two matrix elements. Therefore, H(ij;kl) = H(i,k) + H(j,l), where H(i,j) 
is the Hamming distance between bitstrings i and j. In fact, H(ij; kl) denotes the 
Hamming distance between the concatenated bitstrings ij and kl. (This concatenation 
notation is useful for Hamming distance calculations and will be used throughout the 
paper.) The diagonals are set to zero after the averaging process is finished. The model 
in which such a correlated interaction matrix is used is called the correlated model. 
Theoretical calculations of correlation functions are discussed in |Appendix A 

There are two modelling issues that need to be discussed here. The first one is 
the interpretation of the mutation process. As we mentioned above, in the uncorrelated 
model, matrix elements M® z and M® z are not correlated, even if y is a mutant of x. 
(We call y a mutant of x if H(x, y) — 1.) Therefore, the set of interactions between the 
mutant y and other species can be completely different than those between the wildtype 
x and the other species. This lack of relationship between the mutant and the wildtype 
is not very realistic. Therefore, the mutation process in the uncorrelated model may, 
in some sense, be interpreted as the introduction of a completely new species to the 
system from a pool of species, reminiscent of migration. In the correlated model, the 
interaction constants of the mutant and the wildtype are correlated. However, whether 
these correlations are strong enough to describe those between a real mutant and its 
wildtype is arguable. The other issue is the non-antisymmetrical form of M, meaning 
My 7^ —Mji. This feature of the model, along with the absence of an external energy 
source, gives rise to unrealistic numbers of mutualistic interactions in the system j^j . 

We have already made a preliminary comparison between the correlated model 
and the Gaussian- and uniform-uncorrelated models in reference ^H] for short-range 
correlations, n — 1 and n = 2. The results suggested that the correlated and Gaussian- 
uncorrelated models behave quite similarly, so that short-range correlations in M may 
not have a significant effect on the dynamics. In this paper, we extend our investigation 
to the effects of longer-range correlations with n > 2. 
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3.2. Distributions of Matrix Elements, Correlations Between Matrix Elements, and 
Complications in the Averaging Process 



The averaging process mentioned above causes a few unexpected complications. The 
first one is a shift in the mean. Although the matrix elements are drawn from a 
distribution with mean zero, for a finite matrix, the mean, (M£), while quite small, 
is always non-zero. The averaging alters the mean of the matrix elements, so that the 
means of the initial random and the averaged matrices can be considerably different. 
The reason for the shift is the following. Since each averaged element is multiplied by 
\[Z^ l to keep the standard deviation constant, the mean of the elements of the averaged 
interaction matrix becomes (My) = \/Z^(M?-). Although (M£) is a negligibly small 
number, multiplication by a large \fZ^ could cause a considerable shift, (My) — (M°), in 
the mean. In order to prevent this, we adjust the mean, (M y -), by subtracting (Mg)/2 2L 
from each element of the initial random matrix. This modification minimizes the shift 
in the mean. 

The other complication of the averaging process is a change in the standard 
deviation of My-, even though we intend to keep it constant. As in the problem of 
the shift in the mean mentioned above, the standard deviation of the averaged matrix 
can be considerably different than that of the initial random matrix. As seen in figure ^ 
the matrices averaged for n — 1 have approximately the same Gaussian form with a 
standard deviation close to <7o, which practically overlaps with the distribution of 
the initial Gaussian-uncorrelated matrix. However, we found that long-range averaging 
{n > L/2) could generate distributions with significantly different standard deviations, 
like the n = 7 curve included in figure H The reason for the change in the standard 
deviation is very similar to that for the shift in the mean mentioned above. This anomaly 
is discussed further in Appendix B 



We also checked the correlations between matrix elements to see the effect of 
averaging. The correlations between two matrix elements, My and Mjy, depend on 
the overlap of the M^ terms that occur in the average in equation (J2J). Theoretical 
calculations predict decaying correlation functions with steps due to the city-block 
metric, as shown in figure IA"T1 These calculations are explained in Appendix A 



The correlation functions of the matrices with short-range correlations are in good 
agreement with the theory. However, as n is increased, the numerical correlation 
functions begin to deviate from their theoretical counterparts, as shown in figure |2] 
The numerical correlation functions of the highly correlated matrices are distorted and 
translated along the y-axis, compared to the theoretical ones. We believe that the cause 
of this anomaly is related to the cause of the anomaly in the standard deviations. This 



complication is discussed in|Appendix B 
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Figure 2. Theoretical and some numerical realizations of correlation functions 
between matrix elements as functions of the Hamming distance between their indices, 
r = H(ij,kl), with L = 8; n = 1, 3, 5, and 7. The numerical correlation functions 
were not averaged over independent realizations in order to show the magnitudes and 
the character of the deviations. Theory and numerics agree for small n. Increasing 
n leads to deviations from the theory, usually retaining the overall shape of the 
expected correlation function. (Numerical calculations were performed before setting 
the diagonal of M to zero.) 



4. Simulation Results 

Unless otherwise indicated, we performed sixteen independent runs for 2 25 =33 554432 
generations for each model using parameters similar to those in references [Zj and |18j : 
genome length L = 8 bits, mutation rate /i = 10~ 3 per individual per generation, 
carrying capacity N = 2000, fecundity F = 4; and for the correlated model, averaging 
radii n = 1,3,5, and 7. Each run used a different M and a different sequence of random 
numbers. Each simulation started with 200 individuals of genotype 0. The system 
parameters were recorded every sixteen steps. 

In order to see the effect of the shape of the M?- distribution, we first ran four sets 
of simulations of the uncorrelated model with a uniform M?- distribution with standard 
deviation <t , and with a Gaussian M?- distribution with standard deviations cto/2, o"o 
and 2<To- For these systems with different Mfj distributions, we compared histograms of 
species lifetimes. The species lifetime is defined as the number of generations between 
creation and extinction of a species, during which its population is continuously positive. 

As seen in figure El the lifetime distributions of the uncorrelated systems with 
different matrix-element distributions all exhibit approximately power-law like decay 
over seven decades in time, with an exponent near —2. Figure E] also shows that a 
narrower M?- distribution flattens the concavity between 10 and 10 4 generations in the 
lifetime distribution, while a wider one makes it deeper. 

The lifetime distributions of the correlated models shown in figure |U also exhibit a 
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Figure 3. Normalized histograms of species- lifetimes for the uncorrelated model with 
a uniform matrix-element distribution with standard deviation <j , and with Gaussian 
matrix-element distribution widths, uq/2, o$, and 2<7o- Based on simulations of 2 25 
generations each with L = 8, averaged over eight runs. 

power-law like decay similar to the ones in figure El The distributions obtained from 
the simulations with very short-range correlations, like n = 1, practically overlap with 
the one corresponding to the uncorrelated model. Increasing n leads to a deviation from 
the distribution of the uncorrelated model. The n = 3,5, and 7 curves in figure 0] look 
similar to the <To/2 curve in figureEl suggesting that the change in the standard deviation 
(or the shape) of the My distribution could perhaps be responsible for this anomaly. 
In order to make sure that it is actually the correlations that cause the changes, we 
suppressed this anomaly: we ran another set of simulations for n = 7 in which the 
standard deviation of the My distribution was adjusted to do after averaging. The 
result is shown in the inset in figure EJ The two lifetime distributions in the inset almost 
completely coincide. They are both the averages of eight-run sets with n = 7 (using the 
same random number sequences), except that the My distribution of the second set is 
adjusted after averaging. The very small difference between these two curves shows that 
the effect of a change in the standard deviation of the My distribution after averaging 
is negligible. Thus, the correlations in M are largely responsible for the differences 
between the lifetime distributions of the correlated and uncorrelated models, shown in 
the main part of figure EJ 
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Figure 4. Normalized histograms of species lifetimes for uncorrelated and correlated 
models with different averaging radii, based on simulations of 2 25 generations each 
with L = 8. Each curve represents an average over sixteen runs. The standard 
deviations of the My distributions change after averaging. The averages of the new 
standard deviations are 0.5755(6), 0.570(3), 0.54(1), and 0.46(1) for n = 1, 3, 5, and 7, 
respectively, compared to ctq = 0.5773. Inset: The lifetime histograms for n = 7 with 
and without adjusting the standard deviation of the Mij, averaged over eight runs each. 
The same sets of random numbers were used for the "adjusted" eight-run set as for their 
"non-adjusted" counterparts. The distributions practically overlap, suggesting that the 
change in the standard deviation has much less effect on the lifetime distribution than 
the correlations in M. 



The deviation trend in the lifetime distributions is not monotonic in the averaging 
radius (the n = 5 curve appears below the n = 3 curve). However, the error bars 
for the n = 3 and n = 5 curves overlap. The large error margin due to the anomaly 
in the correlation functions leaves some uncertainty. Nevertheless, even fairly long- 
range correlations in M (like n = 5 and 7) do not change the gross features like the 
approximate 1/t 2 behaviour of the lifetime distributions. 

Another set of statistical measurements that we made for the durations of the quasi- 
steady states (QSS) gave a somewhat conflicting result. The QSSs are the metastable 
periods, during which the community structure appears to be stationary |H] . They 
are interrupted by active periods, during which the community structure changes after 
the emergence of a successful mutant and resulting extinction of other species. In order 
to identify the QSSs, we recorded the entropy of the system every sixteen generations. 
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The information-theoretical entropy is given by [T9*l I20|. 

S({n I (t)}) = - P/Wlnp/W, (3) 

{I\ Pl (t)>0} 

where pj(t) = ni(t) /N to t(t). The system is considered in a QSS, as long as the magnitude 
of the entropy difference, \S(t) — S(t — 16) | / 16, is below a certain threshold, which we 
took as 0.015. (For an extensive discussion on how to obtain this value, see reference [Zj.) 
The QSS-duration distribution for the uncorrelated model is not significantly affected 
by the change in the standard deviation of the M{j distribution, except the density of 
the longest lived communities around 10 7 generations (figure a)). Considering the 
large error bars at this range, the QSS distributions of the uncorrelated systems nearly 
overlap. However, the correlations in M seem to increase the lifetimes of the QSS 
communities (figure EJb) ) . 

In order to get more information on how the fluctuations in the system are affected 
by the correlations in M, we also calculated the power spectral density (PSD) of the 
Shannon- Wiener diversity index. The Shannon- Wiener diversity index [21] is defined 
as D(t) = exp [S ({ni(t)})\, where S is the information-theoretical entropy defined in 
equation (j3J). In contrast to the QSS statistics (figure Efa)) (but consistent with the 
case for the species-lifetime, figure HJ), the PSDs of the Shannon- Wiener indices of the 
uncorrelated systems with different distributions do not overlap (figureEJa)) [2~2"ll23| . 
Although the PSDs are not monotonic in the standard deviation of My, they have a 
similar 1/f like shape, indicating that the general characteristics of the system do not 
vary significantly with the M^- distribution. However, the correlations in M lead to a 
more pronounced 1/f like behaviour in the PSDs by increasing the relative weight of 
high-frequency fluctuations as the correlations grow stronger (figureEfb)). Although the 
deviations from the uncorrelated case are neither gradual nor monotonic in correlation 
strength, the result suggests that the correlations in M may in fact change some 
characteristics of the system, supporting the result obtained from the QSS duration 
statistics. Nevertheless, the qualitative 1/f behaviour of the PSDs is not affected by 
these changes. 

The results discussed above may seem rather complicated, but we nevertheless 
believe some clear effects can be discerned. 

There are clear differences between the results for the correlated and uncorrelated 
Gaussian matrix elements. The correlations further increase the proportion of very long 
QSS (figure 5b), but curiously they also lead to an increase of intermediate-lifetime 
species (figure 4). These effects do not seem very easy to reconcile, but we speculate 
that they may be due to a decoupling of the species lifetimes and the QSS durations 
that has been observed in an uncorrelated predator-prey version of this model [9]. The 
increased activity in the intermediate-time regime results in increased intensity in the 
higher- frequency half of the PSD (figure 6b). It should be noted that strong correlation 
effects are only seen when n becomes a significant proportion of the genome length L 
(here, typically n > 3). Such long-range correlations are of very limited interest for the 
biological systems, and they only become significant in the present study because we, 
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Figure 5. Normalized histograms for the duration of quasi-steady states. A cut- 
off of \S(t) — S(t — 16)|/16 = 0.015 was used to distinguish between QSS and active 
periods. Each curve represents an average over eight runs for uncorrelated models 
and sixteen runs for correlated models. Based on simulations of 2 25 generations 
each, sampled every sixteen generations, (a) Simulations for uncorrelated models with 
uniform and Gaussian Mjj distributions with the standard deviations <7o, and cr /2, ao, 
and 2cto, respectively. The distributions are very similar for the uncorrelated model, 
(b) Simulations for the uncorrelated and the correlated models with n = 1,3,5, and 
7. The Gaussian uo and the n = curves are identical. The distributions for the 
correlated systems differ from the uncorrelated one. 



for computational reasons, are working with the unrealistically short genome length, 
L = 8. With fi<L, the effects of correlations appear to be quite minor. 

5. Summary and Conclusions 

In this paper we have considered the effects of introducing correlations between the 
elements of the interaction matrix M that determines the interspecies interactions in 
an individual-based coevolution model 011113, m which individuals are represented by 
a genome in the form of a bitstring of length L. 
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Figure 6. Power spectral densities of Shannon- Wiener diversity indices for different 
models. Each curve represents an average over eight runs for the uncorrelated models 
and sixteen runs for the correlated models. Based on simulations of 2 25 generations 
each, sampled every sixteen generations. The results obtained from each run were 
also averaged over each octave to reduce the noise, (a) Simulations for uncorrelated 
models with uniform and Gaussian Mij distributions with the standard deviations (To, 
and Co/2, <to, and 2(To, respectively, (b) Simulations for uncorrelated and correlated 
models with n = 1,3, 5, and 7. The Gaussian ag and the n = curves are identical. 
The distributions for the correlated systems differ slightly from the uncorrelated one. 

The correlations were introduced by replacing each element Mf- in an uncorrelated 
interaction matrix with the average over all the elements such that the Hamming 
distance between the concatenated bitstrings ij and kl is less than or equal to n, and 
then reweighting the resulting matrix element such as to maintain the standard 
deviation of its probability density unchanged. 

In section |3] we calculated numerically for different n the correlation functions 
between modified matrix elements and Mm as functions of the Hamming distance 
between ij and kl, and we compared these with the theoretical results derived in 
|Appendix A| For short-range correlations {n < L/2) we found good agreement 
between the numerical and theoretical results, but for larger n there were significant 
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discrepancies. These discrepancies were explained in |Appendix B| 



as a result of the 



reweighting of the correlated matrix elements that was performed in order to keep their 
standard deviation approximately equal to that of the uncorrelated ones. 

Next, in section HJ we performed long kinetic Monte Carlo simulations of our 
coevolution model, both with correlated and uncorrelated interaction matrices. As 
an indicator of the similarity of the evolution processes we calculated the probability 
densities of the lifetimes of individual species in the two models. While we found 
statistically significant differences between the lifetime distributions in the two models 
(see figureHJ), in both cases the distributions stayed near a 1/t 2 power law over near seven 
decades in time. However, the distributions for the durations of QSSs (figure EJ) and 
the PSDs for the diversity index (figure |HJ) showed that the uncorrelated and correlated 
systems have some different characteristics. Nevertheless, the overall behaviour of the 
system does not seem to change drastically, since the 1/f behaviour in the PSDs remain 
the same. This indicates that the correlations in M affect the long-term behaviour of the 
system in minor, but rather complicated ways. The correlation effects are sufficiently 
mild that it is permissible to draw conclusions from simulations of uncorrelated models. 
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Appendix A. Calculating the Correlation Function 

The correlations between elements of the interaction matrix are described by the 
correlation function C(r) = (MijMki) r — {Mij){Mki). The subscript r means that 
the averages are not calculated over all matrix elements, but only over ones that are 
separated by a Hamming distance H(ij; kl) = r. Assuming the average of the matrix 
elements, (Mij), is zero, the correlation function simply becomes 



where p = H(ij;ab) and p' = H(kl;cd). By expanding the sums in the average, we 
obtain 



C(r) = (M^ Mm) r . 
Substituting equation (J2J) into equation (|A.1J) gives 



(A.l) 




(A.2) 



(M.jM^r = ((M° ab + M% + . . .)(M° cd + M° d , + ...)> /Z, 



(A.3) 
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r (Hamming distance) 

Figure Al. Theoretical correlation functions for L = 13 with n = 1, 2, 3, 5, and 7. 
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(clean part) bits (dirty part) (clean part) bits (dirty part) 



Figure A2. (a) Common elements in the two sums shown symbolically as the 
intersection of two 2L-dimensional hyperspheres in genome space with radii n. (b) 
"Dirty" and "clean" parts. See discussion in the text. 

In this average, all cross terms, (M° b M? d ) where a^c and/or b ^ d, are equal to zero. 
However, if a = c and b = d, then the term ((M® b ) 2 ) is equal to (Tq, which is the variance 
of the elements of the random interaction matrix. Therefore, the correlation function 
becomes 

C(r) = (M l3 M kl ) r = Af L , n (r)al/Z n = tf L>n (r)/3Z n , (A.4) 

where AfL,n(f) is the number of common terms in the two sums in equation (jA.2|) . The 
calculation of A/i |n (r) is explained below. 

Appendix A.l. Calculating ( r ) 

The terms in the sum in equation (j2J) are neighbours of M?- within a hypersphere of 
"radius" n, centred at ij in the 2L-dimensional genome space. So, the common terms 
in the sums in equation ()A.2j) are the ones which lie in the intersection of two 2L- 
dimensional "hyperspheres" of radii n, centred at A = ij and B = kl (figure E2fa)). 
This analogy helps us to reformulate the problem of finding the common terms in the 
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sums in equation (|A.2j) as the following. What is the total number of bitstrings that 
are n bits or less away from both bitstrings A and B, when A and B differ from each 
other by r bits? Equivalently, we want to find the number of identical bitstrings we can 
create by making n or less modifications on A and B each. 

To make it easier to visualize the solution, we arrange the two bitstrings A and B 
so that the matching bits constitute the left-hand part of each bitstring. We shall call 
these 2L — r matching bits the clean part, and the r non-matching bits the dirty part 
(figure IX2T b)). We emphasize that this arrangement is only for visualization purposes, 
and not a part of the solution. 

The contributions to A/i, n (r) can be grouped in three cases: 

a) n > r > 0: First, we make the dirty parts identical by making i modifications 
on either A or B and r — i modifications on the other. There are YH=o (Q = 
Sl=o ( -7 = 2 r ways of making the dirty parts identical. Since n > r, we can 
also make k = n — max(r — = min(n — r + i,n — 1) modifications on the 
clean part, unless k > 2L — r, in which case the number of modifications exceeds 
2L — r, the size of the clean part. Therefore, the correct upper limit for k should be 
min(2L — r,n — r + i,n — i). Thus, the total number of different configurations that 
can be obtained on the clean part is Y^^ 2L ~ r ' n ~ r+ ' i ' n ~' 1 ' > ( 2L fc _r ) • Combining these, 
we obtain the contribution to AfL,n( r ) f° r the < r < n case: 

r min(2L— r,n— r+i,n— i) , \ / \ 

8=0 k=0 v / \ / 

b) 2n > r > n: When r > n, we have to make at least r — n modifications on one of 
the dirty parts. Therefore, the contribution to ML, n { r ) for the 2n > r > n case is 

e e (;)()• < a ' 6 > 

i=r-n k=0 ^ / V / 

Note that the upper limit of the first sum is not r anymore, since we are allowed 
to make only n < r modifications in total. 

c) r > 2n: Since the two hyperspheres in figure IA2t a) cannot overlap when r > 2n, 
the contribution to ML,n{j") in this case is zero. 

We obtain the final form of JVx n (r) by combining equation (jA.5|) and equation ([A. 6)1 : 

min(n,r) min(2L— r,n— r+i,n— i) /nr \ / \ 

M Ln (r) = { £ £ ( if ^ 2n . (A.7) 

' > i=max(r-n,0) k=0 V 7 V 7 V ' 

otherwise 

A plot of the theoretical correlation functions for L = 13 with n =1, 2, 3, 5, and 7 
is shown in figure IA1I 
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Appendix B. Deviations from the Theory in Correlation Functions of 
Highly Correlated Matrices 

The theoretical calculations for the correlation function agree with the numerical 
results for matrices with short-range correlations. However, the numerically obtained 
correlation functions for highly correlated matrices differ significantly from the theory, 
as seen in figure El We here demonstrate why these deviations occur by calculating the 
variance of the correlated matrix elements (which is equal to the value of the correlation 
function at r = 0): 

C(0) = ^corr = ((Mij) 2 } — (Mij) 2 . (B.l) 

Using 

(Mij) 2 = Z n (M°) 2 (B.2) 
(this time we do not assume (M° ) = 0) and 

((Mi) 2 ) = {(M ° ) 2 + M° 00 M° 01 + ... + (Mo ,) 2 + « + M° Q1 M° 02 + ...}, (B.3) 
where S = 2 2L is the total number of matrix elements, we get 

o* m = J^i Z " (M)) 2 + M) 2 + •••)+ 2 KoKi + •••)}- Z n(M°) 2 . (B.4) 
Then we transform the terms in parentheses into averages and obtain 
o* m = ^ {sZ n ({M»f) + 2 ^" ( ^" 1 ^ MgM°) 1 < Hfa , fc;) < 2n | - Z n (M°f , (B.5) 
which simplifies to 

^coxr = ((^) 2 ) + (Zn - l)<^^>l^«,-!«)<*. " Z n (M°) 2 . (B.6) 

Although we intended to keep the variance of the correlated matrix elements constant 
through multiplication by Z n (see equation (J2J)), equation (jB.6|) indicates that the 
deviations from the intended theoretical value cr 2 , = ((M? ) 2 ) scale by Z n . Our numerical 
studies showed that for Z„ > L/2, the term {Z n — l){Mf-M^ l )i< H ^ i j.^i)<2n — Z n {Mf-) 2 can 
no longer be neglected, and deviations from the theoretical value become significant. 

We verified equation (IB~6l) by calculating, a 2 orr , ((M°-) 2 ), (MP-M^)i< H (i i;W )<2„, and 
(M? ) 2 in computer simulations for L = 7 with n = 3 and 4, over 100 runs each. Plugging 
the values for ((M?-) 2 ), (M?-Mjy)i<#(y ; jy)<2n and (M°) 2 obtained from the simulation 
into the RHS of equation ()B.6j) gives the same value for cr 2 orr as in the simulation, with 
an error of O(10 -6 ) . This result supports our theory. 

Although this calculation shows the deviation only for C(0) = cr 2 orr , deviations 
for r > should be similar due to the relatively smooth change in the overlap of the 
averaging hyperspheres. Since the overlaps of the averaging hyperspheres are similar for 
adjacent r's, the deviations also should be similar for the adjacent C(r)'s. For example, 
the deviations at C(0) and C(l) should be similar in magnitude and direction. This 
argument explains why the numerical correlation functions look like their theoretical 
counterparts translated along the y axis, rather than randomly scattered around the 
expected data points. 
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